Privacy-Preserving Boosting with Random Linear Classifiers for Learning from User-Generated Data
Abstract
User-generated data is crucial to predictive modeling in many applications. With a web/mobile/wearable interface, an online service provider (SP) can continuously record user-generated data and rely on various predictive models learned from that data to improve its services and revenue. However, SPs' ownership of large collections of user-generated data has raised privacy concerns. We present a privacy-preserving framework, SecureBoost, which allows users to submit encrypted or randomly masked data to the SP, who learns only the prediction models and nothing else. Our framework uses random linear classifiers (RLCs) as the base classifiers in boosting, which simplifies the design of the privacy-preserving protocol. A Cryptographic Service Provider (CSP) assists the SP's processing, reducing the complexity of the protocol constructions, while the leakage of information to the CSP is limited. We present two constructions of SecureBoost, HE+GC and SecSh+GC, which combine homomorphic encryption, garbled circuits, and random masking to achieve both security and efficiency. We have conducted extensive experiments to understand the quality of RLC-based boosting and the cost distribution of the constructions. The results show that SecureBoost efficiently learns high-quality boosting models from protected user-generated data.
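To make the learning idea concrete, here is a minimal plaintext sketch of boosting over random linear classifiers: hyperplanes are drawn at random, so "training" a base classifier reduces to evaluating its weighted error, which is the property that simplifies a privacy-preserving protocol. The AdaBoost weighting, Gaussian hyperplane sampling, and `n_trials` best-of-k selection are illustrative assumptions, not the paper's exact construction, and this sketch omits all encryption.

```python
import numpy as np

def fit_rlc_boosting(X, y, n_rounds=30, n_trials=50, seed=0):
    """AdaBoost over random linear classifiers (RLCs), plaintext sketch.

    Each round draws `n_trials` random hyperplanes (a, b), keeps the one
    with the lowest weighted error, and reweights samples as in AdaBoost.
    Labels y must be in {-1, +1}.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.full(n, 1.0 / n)              # per-sample boosting weights
    ensemble = []                        # list of (alpha, a, b) triples
    for _ in range(n_rounds):
        best = None
        for _ in range(n_trials):        # best-of-k random hyperplanes
            a, b = rng.standard_normal(d), rng.standard_normal()
            pred = np.sign(X @ a + b)
            err = w[pred != y].sum()
            if err > 0.5:                # flipped orientation beats chance
                a, b, err = -a, -b, 1.0 - err
            if best is None or err < best[0]:
                best = (err, a, b)
        err, a, b = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # classifier weight
        pred = np.sign(X @ a + b)
        w *= np.exp(-alpha * y * pred)          # upweight mistakes
        w /= w.sum()
        ensemble.append((alpha, a, b))
    return ensemble

def predict(ensemble, X):
    """Weighted-majority vote of the random linear base classifiers."""
    score = sum(alpha * np.sign(X @ a + b) for alpha, a, b in ensemble)
    return np.sign(score)
```

In the secure setting, only the evaluation of `X @ a + b` and the weighted-error aggregation would run under encryption or masking; the random hyperplanes themselves carry no private information.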
Similar resources
Towards Seamless Tracking-Free Web: Improved Detection of Trackers via One-class Learning
Numerous tools have been developed to aggressively block the execution of popular JavaScript programs in Web browsers. Such blocking also affects functionality of webpages and impairs user experience. As a consequence, many privacy preserving tools that have been developed to limit online tracking, often executed via JavaScript programs, may suffer from poor performance and limited uptake. A me...
Towards Attack-Resilient Geometric Data Perturbation
Data perturbation is a popular technique for privacy-preserving data mining. The major challenge of data perturbation is balancing privacy protection and data quality, which are normally considered a pair of conflicting factors. We propose that selectively preserving only the task/model-specific information in perturbation would improve the balance. Geometric data perturbation, consisting o...
Rademacher Observations, Private Data, and Boosting
The minimization of the logistic loss is a popular approach to batch supervised learning. Our paper starts from the surprising observation that, when fitting linear (or kernelized) classifiers, the minimization of the logistic loss is equivalent to the minimization of an exponential rado-loss computed (i) over transformed data that we call Rademacher observations (rados), and (ii) over the same...
Do we need hundreds of classifiers to solve real world classification problems?
We evaluate 179 classifiers arising from 17 families (discriminant analysis, Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regre...
Differentially Private Empirical Risk Minimization
Privacy-preserving machine learning algorithms are crucial for the increasingly common setting in which personal data, such as medical or financial records, are analyzed. We provide general techniques to produce privacy-preserving approximations of classifiers learned via (regularized) empirical risk minimization (ERM). These algorithms are private under the ε-differential privacy definition du...
Journal: CoRR
Volume: abs/1802.08288
Pages: -
Publication date: 2018